[SPARK-12004] Preserve the RDD partitioner through RDD checkpointing #9983

Closed
tdas wants to merge 4 commits

Conversation

@tdas (Contributor) commented Nov 26, 2015

The solution is to save the RDD partitioner in a separate file in the RDD checkpoint directory, that is, `<checkpoint dir>/_partitioner`. In most cases, whether or not the RDD partitioner is recovered does not affect correctness; failing to recover it only reduces performance. So this solution makes a best-effort attempt to save and recover the partitioner; if either step fails, checkpointing itself is not affected. This makes the patch safe and backward compatible.
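For illustration, here is a minimal sketch of what such a best-effort save/recover could look like. The object and method names are hypothetical (the actual change lives in `ReliableCheckpointRDD`), and plain Java serialization stands in for Spark's configured serializer:

```scala
import java.io.{ObjectInputStream, ObjectOutputStream}

import scala.util.control.NonFatal

import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.Path

import org.apache.spark.Partitioner

// Hypothetical sketch of the best-effort scheme described above: the partitioner is
// written to <checkpoint dir>/_partitioner, and any failure is swallowed so that
// checkpointing itself is never affected.
object PartitionerCheckpointSketch {

  private val PartitionerFileName = "_partitioner"

  /** Best-effort write; failures are logged and ignored. */
  def writePartitioner(checkpointDir: String, partitioner: Partitioner, conf: Configuration): Unit = {
    try {
      val path = new Path(checkpointDir, PartitionerFileName)
      val fs = path.getFileSystem(conf)
      val out = new ObjectOutputStream(fs.create(path, true))  // overwrite if present
      try out.writeObject(partitioner) finally out.close()
    } catch {
      case NonFatal(e) =>
        println(s"Could not save partitioner to $checkpointDir, ignoring: $e")
    }
  }

  /** Best-effort read; returns None if the file is missing or unreadable. */
  def readPartitioner(checkpointDir: String, conf: Configuration): Option[Partitioner] = {
    try {
      val path = new Path(checkpointDir, PartitionerFileName)
      val fs = path.getFileSystem(conf)
      if (fs.exists(path)) {
        val in = new ObjectInputStream(fs.open(path))
        try Some(in.readObject().asInstanceOf[Partitioner]) finally in.close()
      } else {
        None
      }
    } catch {
      case NonFatal(e) =>
        println(s"Could not recover partitioner from $checkpointDir, ignoring: $e")
        None
    }
  }
}
```

A recovered `Some(partitioner)` can be attached to the checkpointed RDD, while `None` simply means the RDD behaves exactly as it did before this patch.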

@tdas (Contributor, Author) commented Nov 26, 2015

@zsxwing @andrewor14 Can you take a look at this?

@@ -55,25 +55,7 @@ private[spark] class ReliableRDDCheckpointData[T: ClassTag](@transient private v
* This is called immediately after the first action invoked on this RDD has completed.
*/
protected override def doCheckpoint(): CheckpointRDD[T] = {

@tdas (Author) commented:

All this code has been moved into ReliableCheckpointRDD.createCheckpointedRDD.

/**
 * Write an RDD partition's data to a checkpoint file.
 */
def writePartitionToCheckpointFile[T: ClassTag](
A reviewer (Contributor) commented:
private

@andrewor14 (Contributor) commented:
LGTM. Just style and naming nits.

@SparkQA commented Nov 26, 2015

Test build #46724 has finished for PR 9983 at commit 4dfa265.

  • This patch fails from timeout after a configured wait of 250m.
  • This patch merges cleanly.
  • This patch adds no public classes.

@tdas (Contributor, Author) commented Dec 1, 2015

jenkins test this please

@SparkQA commented Dec 1, 2015

Test build #2132 has finished for PR 9983 at commit 4dfa265.

  • This patch fails to build.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA commented Dec 1, 2015

Test build #46931 has finished for PR 9983 at commit bf7bebf.

  • This patch fails to build.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA commented Dec 1, 2015

Test build #46935 has finished for PR 9983 at commit 9eb7250.

  • This patch fails PySpark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@andrewor14 (Contributor) commented:
retest this please

@SparkQA commented Dec 1, 2015

Test build #46969 has finished for PR 9983 at commit 9eb7250.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@andrewor14 (Contributor) commented:
m1.6

asfgit pushed a commit that referenced this pull request Dec 1, 2015
The solution is to save the RDD partitioner in a separate file in the RDD checkpoint directory, that is, `<checkpoint dir>/_partitioner`. In most cases, whether or not the RDD partitioner is recovered does not affect correctness; failing to recover it only reduces performance. So this solution makes a best-effort attempt to save and recover the partitioner; if either step fails, checkpointing itself is not affected. This makes the patch safe and backward compatible.

Author: Tathagata Das <[email protected]>

Closes #9983 from tdas/SPARK-12004.

(cherry picked from commit 60b541e)
Signed-off-by: Andrew Or <[email protected]>
@asfgit closed this in 60b541e on Dec 1, 2015
asfgit pushed a commit that referenced this pull request Dec 7, 2015
…ner not present

The reason is that TrackStateRDDs generated by trackStateByKey expect the previous batch's TrackStateRDDs to have a partitioner. However, when recovering from DStream checkpoints, the RDDs recovered from RDD checkpoints do not have a partitioner attached to them. This is because RDD checkpoints do not preserve the partitioner (SPARK-12004).

While #9983 solves SPARK-12004 by preserving the partitioner through RDD checkpoints, there is a non-zero chance that saving or recovering the partitioner fails. To be resilient, this PR repartitions the previous state RDD if the partitioner is not detected.

Author: Tathagata Das <[email protected]>

Closes #9988 from tdas/SPARK-11932.
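As a rough illustration of the fallback described in the commit message above (the object and method names below are hypothetical, not the actual TrackStateRDD code), the repartition-if-needed logic amounts to something like:

```scala
import scala.reflect.ClassTag

import org.apache.spark.Partitioner
import org.apache.spark.rdd.RDD

object StatePartitionerFallbackSketch {

  /**
   * If the recovered state RDD lost its partitioner (for example, because the
   * best-effort recovery from SPARK-12004 failed), repartition it with the expected
   * partitioner so that subsequent per-batch operations against it do not keep shuffling.
   */
  def ensurePartitioner[K: ClassTag, V: ClassTag](
      prevStateRDD: RDD[(K, V)],
      expected: Partitioner): RDD[(K, V)] = {
    if (prevStateRDD.partitioner.exists(_ == expected)) {
      prevStateRDD                        // partitioner preserved, no extra shuffle
    } else {
      prevStateRDD.partitionBy(expected)  // one-time shuffle to restore the layout
    }
  }
}
```

This keeps recovery correct even when the `_partitioner` file cannot be read, roughly at the cost of one extra shuffle after recovery.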
asfgit pushed a commit that referenced this pull request Dec 7, 2015
…ner not present

The reason is that TrackStateRDDs generated by trackStateByKey expect the previous batch's TrackStateRDDs to have a partitioner. However, when recovering from DStream checkpoints, the RDDs recovered from RDD checkpoints do not have a partitioner attached to them. This is because RDD checkpoints do not preserve the partitioner (SPARK-12004).

While #9983 solves SPARK-12004 by preserving the partitioner through RDD checkpoints, there is a non-zero chance that saving or recovering the partitioner fails. To be resilient, this PR repartitions the previous state RDD if the partitioner is not detected.

Author: Tathagata Das <[email protected]>

Closes #9988 from tdas/SPARK-11932.

(cherry picked from commit 5d80d8c)
Signed-off-by: Tathagata Das <[email protected]>